We import the master data, which has already been cleaned.
library(tidyverse)
library(plotly)
setwd("/Users/mengxisun/Documents/Fall 2022/Data Science I/p8105-finalproject.github.io/")
data = read_csv(file = "./data/master.csv") %>%
  janitor::clean_names()
## Rows: 61 Columns: 20
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): borough, region
## dbl (18): age_less_20, age_20_24, age_25_29, age_30_34, age_35_39, age_plus_...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
From the master spreadsheet, we selected the columns and rows needed to analyze induced abortions by covariate category.
We used pivot_longer to reshape the data into long format and then
plotly to create interactive plots.
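The reshaping step can be illustrated on a small table. This is only a sketch: the table `wide`, its area names, and its values are invented for illustration and are not taken from master.csv. It assumes the same wide layout as the master data, with one column per age group.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical wide table: one row per area, one column per age group.
# All values are made up for illustration.
wide <- tibble(
  borough = c("Area A", "Area B"),
  age_less_20 = c(900, 700),
  age_20_24 = c(800, 600),
  age_25_29 = c(500, 400)
)

# Reshape to long format: one row per (borough, age group) pair.
long <- wide %>%
  pivot_longer(
    age_less_20:age_25_29,
    names_to = "age",
    values_to = "abortion"
  ) %>%
  # Set the factor levels explicitly so the age groups keep their natural order.
  mutate(age = factor(age, levels = c("age_less_20", "age_20_24", "age_25_29")))

long
```

The long table has one value column, which is the shape plot_ly expects when x, y, and color are each mapped to a single column.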
# Columns 1-7 hold borough plus the six age groups; rows 3-7 are the borough-level rows.
age =
  data %>%
  select(1:7) %>%
  slice(3:7)
# Reshape to long format, order the age groups, and draw a grouped bar chart.
age %>%
  pivot_longer(
    age_less_20:age_plus_40,
    names_to = "age",
    values_to = "abortion"
  ) %>%
  mutate(age = factor(age, levels = c("age_less_20", "age_20_24", "age_25_29", "age_30_34", "age_35_39", "age_plus_40"))) %>%
  plot_ly(x = ~age, y = ~abortion, color = ~borough, type = "bar", colors = "viridis") %>%
  layout(
    title = 'Abortion Ratios by Age for Boroughs in New York City',
    yaxis = list(title = 'Number of Induced Abortions per 1,000 Live Births')
  )
age %>% filter(age_less_20 == max(age_less_20))
## # A tibble: 1 × 7
## borough age_less_20 age_20_24 age_25_29 age_30_34 age_35_39 age_plus_40
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Manhattan 1965. 1769. 988. 312. 256. 312
From the bar plot, it is evident that those younger than 20 had the
highest abortion ratios across the boroughs in NYC in 2019. These ratios
seem to decrease as age increases. Comparing boroughs, Manhattan had the
highest abortion ratios in categories age_less_20 through age_25_29.
Staten Island had the lowest abortion ratios overall, except for
category age_20_24. The highest abortion ratio across all boroughs and
age categories was in Manhattan for age_less_20, with 1964.9 induced
abortions per 1,000 live births.
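The filter() call above looks for the maximum within the age_less_20 column only. To find the extremes across every borough and age category at once, one option is to pivot to long format first and then use slice_max() and slice_min(). The sketch below uses a hypothetical table `age_demo` with invented values, not the ratios from master.csv.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical ratios, invented for illustration (not from master.csv).
age_demo <- tibble(
  borough = c("Area A", "Area B", "Area C"),
  age_less_20 = c(900, 700, 650),
  age_20_24 = c(800, 600, 620),
  age_plus_40 = c(300, 250, 200)
)

# One row per (borough, age group) cell.
age_long <- age_demo %>%
  pivot_longer(-borough, names_to = "age", values_to = "abortion")

# Highest and lowest ratio over all borough-by-age cells.
highest <- age_long %>% slice_max(abortion, n = 1)
lowest <- age_long %>% slice_min(abortion, n = 1)

highest
lowest
```

Computing both extremes from the same long table avoids reporting one value twice by mistake.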
# Column 1 is borough; columns 9-12 are the four race/ethnicity ratio columns.
race =
  data %>%
  select(1, 9:12) %>%
  slice(3:7)
race %>%
  pivot_longer(
    nh_white_only_ratio:h_total,
    names_to = "race",
    values_to = "abortion"
  ) %>%
  mutate(race = factor(race, levels = c("nh_white_only_ratio", "nh_black_only_ratio", "nh_other_ratio", "h_total"))) %>%
  plot_ly(x = ~race, y = ~abortion, color = ~borough, type = "bar", colors = "viridis") %>%
  layout(
    title = 'Abortion Ratios by Race for Boroughs in New York City',
    yaxis = list(title = 'Number of Induced Abortions per 1,000 Live Births')
  )
From the bar plot, it is evident that those who identified as Non-Hispanic Black had the highest abortion ratios across all boroughs in NYC. Among those who identified as Non-Hispanic Black, ratios were highest in Manhattan, with 1,228.3 induced abortions per 1,000 live births. Those who identified as Non-Hispanic White and Non-Hispanic Other had some of the lowest abortion ratios in NYC. The lowest ratio across all boroughs and race groups was for those who identified as Non-Hispanic White only and lived in Brooklyn, with 88.6 induced abortions per 1,000 live births.
# Column 1 is borough; columns 15-18 are the four financial plan columns.
financial_plan =
  data %>%
  select(1, 15:18) %>%
  slice(3:7)
financial_plan %>%
  pivot_longer(
    medicaid:not_stated,
    names_to = "financial_plan",
    values_to = "abortion"
  ) %>%
  # Order the financial_plan column itself (the original assigned the factor to a new `race` column).
  mutate(financial_plan = factor(financial_plan, levels = c("medicaid", "self_pay", "other_insurance", "not_stated"))) %>%
  plot_ly(x = ~financial_plan, y = ~abortion, color = ~borough, type = "bar", colors = "viridis") %>%
  layout(
    title = 'Abortion Ratios by Financial Plan for Boroughs in New York City',
    yaxis = list(title = 'Number of Induced Abortions per 1,000 Live Births')
  )
From the bar plot, those categorized as having other_insurance had the
highest abortion ratios of all financial plans in NYC. These ratios were
not provided as they were in the other data sets, so we calculated them
manually as (induced abortions / live births) * 1000. A major limitation
of these numbers is that people who have induced abortions and people
who give birth tend to use different insurance plans. People who give
birth are less likely to use other_insurance than those who have induced
abortions, because of the inherent differences between the procedures.
Those who used self_pay for induced abortions also had a relatively high
abortion ratio. This is plausible given the nature of the procedure and
possible restrictions on abortion coverage by health insurance plans.
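The manual ratio calculation described above can be sketched as follows. The table `counts`, its column names induced_abortions and live_births, and all of its values are hypothetical. They illustrate the formula only and do not reproduce the numbers behind master.csv.

```r
library(dplyr)
library(tibble)

# Hypothetical counts by financial plan, invented for illustration only.
counts <- tibble(
  financial_plan = c("medicaid", "self_pay", "other_insurance"),
  induced_abortions = c(500, 120, 300),
  live_births = c(2000, 100, 400)
)

# Abortion ratio = induced abortions per 1,000 live births.
ratios <- counts %>%
  mutate(abortion_ratio = induced_abortions / live_births * 1000)

ratios
```

In this made-up example the self_pay ratio exceeds 1,000 because very few births fall in that category, which is the kind of denominator effect the limitation above describes.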
# Rows 1-2 are the New York City and New York State totals.
age_total =
  data %>%
  select(1:7) %>%
  slice(1:2)
age_total %>%
  pivot_longer(
    age_less_20:age_plus_40,
    names_to = "age",
    values_to = "abortion"
  ) %>%
  mutate(age = factor(age, levels = c("age_less_20", "age_20_24", "age_25_29", "age_30_34", "age_35_39", "age_plus_40"))) %>%
  plot_ly(x = ~age, y = ~abortion, color = ~borough, type = "bar", colors = "viridis") %>%
  layout(
    title = 'Abortion Ratios by Age for New York City and New York State',
    yaxis = list(title = 'Number of Induced Abortions per 1,000 Live Births')
  )
From the bar plot, it is evident that New York City had a higher
abortion ratio than New York State in every age category. Generally,
these ratios seem to decrease as age increases. Ratios were highest in
the age_less_20 category for both New York City and New York State.
# Rows 1-2 are the New York City and New York State totals.
race_total =
  data %>%
  select(1, 9:12) %>%
  slice(1:2)
race_total %>%
  pivot_longer(
    nh_white_only_ratio:h_total,
    names_to = "race",
    values_to = "abortion"
  ) %>%
  mutate(race = factor(race, levels = c("nh_white_only_ratio", "nh_black_only_ratio", "nh_other_ratio", "h_total"))) %>%
  plot_ly(x = ~race, y = ~abortion, color = ~borough, type = "bar", colors = "viridis") %>%
  layout(
    title = 'Abortion Ratios by Race for New York City and New York State',
    yaxis = list(title = 'Number of Induced Abortions per 1,000 Live Births')
  )
# Rows 1-2 are the New York City and New York State totals.
financial_plan_total =
  data %>%
  select(1, 15:18) %>%
  slice(1:2)
financial_plan_total %>%
  pivot_longer(
    medicaid:not_stated,
    names_to = "financial_plan",
    values_to = "abortion"
  ) %>%
  # Order the financial_plan column itself (the original assigned the factor to a new `race` column).
  mutate(financial_plan = factor(financial_plan, levels = c("medicaid", "self_pay", "other_insurance", "not_stated"))) %>%
  plot_ly(x = ~financial_plan, y = ~abortion, color = ~borough, type = "bar", colors = "viridis") %>%
  layout(
    title = 'Abortion Ratios by Financial Plan for New York City and New York State',
    yaxis = list(title = 'Number of Induced Abortions per 1,000 Live Births')
  )